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Abstract 

In this article, we propose a new fast nearest neighbor search algorithm, based on vector quan- 
tization. Like many other branch and bound search algorithms |T|[T3, a preprocessing recursively 
partitions the data set into disjointed subsets until the number of points in each part is small enough. 
In doing so, a search-tree data structure is built. This preliminary recursive data-set partition is based 
on the vector quantization of the empirical distribution of the initial data-set. 

Unlike previously cited methods, this kind of partitions does not a priori allow to eliminate 
several brother nodes in the search tree with a single test. To overcome this difficulty, we propose 
an algorithm to reduce the number of tested brother nodes to a minimal list that we call "friend 
Voronoi cells". The complete description of the method requires a deeper insight into the properties 
of Delaunay triangulations and Voronoi diagrams. 
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Introduction 



The problem of nearest neighbor search, also known as the post office problem [J has been widely 
investigated in the area of computational geometry. It is encountered for many applications, as pattern 
recognition and vector quantization. 

The post-office problem has been solved near optimally for the case of low dimensions. Algorithms 
differ on their practical efficiency on real data sets. For large dimensions, most solutions have a complexity 
that grows exponentially with the dimension, or require a bigger query time than the obvious brute force 
algorithm. In fact, it has been noticed that, if n is the size of the data set and d is the dimensionality, 
the best choice becomes linear search when d > Klog{n) for some positive constant K which depends 
on the chosen algorithm. This effect is known as the curse of dimensionality. 

As concerns the application to (Voronoi) vector quantization, nearest neighbor projections are rec- 
ognized to represent the critical part of most codebook optimization algorithms. In this case, the big 
amount of nearest neighbor searches we have to do shows that a preprocessing of the data-set will be 
profitable if it reduces the average query time. Still, in some particular cases, the codebook is chosen so 
that nearest neighbor search is performed easily, (as when dealing with product quantization). Moreover, 
non- Voronoi quantization methods can also be designed in order to simplify the projection procedure 
while preserving some important properties of optimal quantizers, as the stationarity in the quadratic 
case. 

Let us also point out that a field recently emerged under the name of dual quantization [12t il3j . In 
this context, the nearest neighbor search, i.e. the location of a point in a Voronoi partition, is replaced 
by the analogous procedure in the Delaunay triangulation. This localization procedure in Delaunay 
triangulations have been widely investigated in the practical viewpoint in terms of reduction of its 
computational complexity. We refer to Devillers, Pion and Teillaud for a review on this subject [5]. 

Many nearest neighbor search algorithms rely on a recursive partitioning of the data-set resulting in 
a search-tree data structure [T1[TU]. The method proposed by McNames in [10 1 improved the classical 
Kd-tree algorithm [T] by taking advantage of the shape of the data-set thanks to principal component 
analysis. The "principal axis tree" algorithm performs much faster than the classical Kd-tree when the 
coordinates of the data-set are correlated and it seems to take better the growth of dimensionality. 

In our case, the proposed algorithm uses vector quantization as a clustering method to perform this 
recursive partitioning and to take advantage of the geometry of the data-set. It is classical background 
that when dealing with empirical distributions, the quadratic vector quantization problem is equivalent 
to the reduction of the intraclass inertia of the related partition, and the specification of the classical 
Lloyd algorithm to this case turns out to be the fc-means clustering algorithm. 

We will see that one draw-back of this kind of partition is that, as other tree-based search algorithms, 
after determining the closest neighbor of a query in a leaf-node of the tree, the procedure has to move 
up to the parent node and determine whether brother nodes have to be explored or not. Unlike Kd-tree 
and "principal axis tree", our so-called "quantization tree" can't eliminate several brother nodes by with 
a single test. This is the motivation for the development of our friend node algorithm. 

The paper is organized as follows. Section [1] is devoted to classical definitions and notations related 
to vector quantization. The link with the classification problem is pointed out. Section [5] recalls in mind 
some definitions of computational geometry which will be useful in the sequel. As both the fields of vector 
quantization and algorithmic geometry deal with the notion of Voronoi diagram, we apply ourselves to 
distinguish the corresponding definitions and notations. Section [3] makes a brief presentation of both 
the Kd-tree [Ij and "principal axis tree" [T^ algorithms. We deal with some optimizations that will be 
applicable with our quantization tree. Section 3] presents the "crude" quantization tree, i.e. without 
using any friend node algorithm. It is presented as the natural counterpart these two branch and bound 
algorithms with a quantization based partition of the data-set. Section [S] presents the friend node 
algorithm which was discussed above. Finally, the last section provides some performance comparisons 
between the different algorithms on various data-sets. 

1 Vector quantization and Voronoi tessellations 

We consider (17, A, P) a probability space and E a (real) finite dimensional Euclidean space. The principle 
of a random variable X taking its values in E is to approach X by a random variable Y taking a finite 
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number of values in E. 



Definition 1 (quantizer) . In this surrounding, the discrete random variable Y is a quantizer of X . 

If X G L^, the quantization error is the norm of |X — y|, where | • | denotes the Euchdean norm on 
E. The minimization of this error yields the following minimization problem 

min{||X - Y\\p,Y : fl ^ E measurable ,card(y(r2)) < N}. (1) 

Definition 2 (Voronoi partition). Consider N £ N* , T ~ {71, • • • , 7jv} C E and let C ~ {Ci, • • • , Cn\ 
he a Borel partition of E. C is a Voronoi partition associated with F i/ Vi G {1, ■ • • ,N}, Ci C G 
i?,|C-7.|= min |e-7,|}- 

If C = {Ci,--- ,Cn} is a Voronoi partition associated with T — {71,- •• ,Jn}, it is clear that V« G 
{1, • • • , N}, £ Ci. Ci is called Voronoi slab associated with 7^ in C and 7; is the center of the slab Ci. 
We denote Ci = slabc(7i)- For every a G F, iy(a|F) is the closed subset of E defined by VF(a|F) = 

y e E,\y - min|j/ -7] 

Definition 3 (Nearest neighbor projection). Consider T C E a finite subset of E. A nearest neighbor 
projection onto F is an application Projp that satisfies 

Vx E E, a; — Projp(a:) = min — 7|. 
To be more precise, if Projp is a measurable nearest neighbor projection onto F, there exists a Voronoi 

N 

partition C = {Ci, ■ ■ ■ , Cn} associated to F such that Projp = ^ Ji^d- 

1=1 

Proposition 1.1. Let X be an E-valued random variable, and Y taking its values in the settled point 
set F = {yi, ■ ■ ■ ,2/Ar} C E where iV G N. Set X^ the random variable defined by X^ :— Projp(X) where 
Projp is a nearest neighbor projection on T, called a Voronoi T -quantizer of X . 



Then we clearly have 



X - X'- < \X -Y\ a.s.. Hence 



X -X^ 



< \\X-Y\\p. 

p 



A consequence of this proposition is that solving the minimization problem ([T]) amounts to solving the 
simpler minimization problem 

min{|lX-Projp(X)|lp, F C card(F) < iV} . (2) 

The quantity \\X — Projp(X)||p is called the mean LP-quantization error. When this minimum is reached, 
we refer to L^-optimal quantization. 

The problem of the existence of a minimum have been investigated for decades on its numerical and 
theoretical aspects in the finite dimensional case [TTl [S]. For every > 1, the LP-quantization error 
is Lipschitz-continuous and reaches a minimum. An A^-tuple that achieves the minimum has pairwise 
distinct components, as soon as card(supp(Pjs:)) > A^. This result stands in the general case of a random 
variable valued in a reflexive Banach space [5]. If card(A(ri)) is infinite, this minimum strictly decreases 
to as A^ goes to infinity. The asymptotic rate of convergence, in the case of non singular distributions 
is ruled by the Zador theorem |5j. A non-asymptotic upper bound for the quantization error is also 
available [S]. 

We now focus on the quadratic case {p = 2). For a random variable X, we now denote Cn{X) 
the set of L^-optimal quantizers of X of level A' and cat (A) the minimal quadratic distortion that can 
be achieved when approximating A by a quantizer of level A. A quantizer y of A is stationary (or 
self-consistent) if F = E[A|y]. 

Proposition 1.2 (Stationarity of L^-optimal quantizers). A (quadratic) optimal quantizer is stationary. 

The stationarity is a particularity of the quadratic case. In other cases, a similar property involving 
the notion of p-center occurs. A proof is available in 
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Definition 4 (Centroidal projection). Let C = {Ci, • • • ,Cn} he. a Borel partition of E. Let us define 

for l<i<N,Gi= \ ^["'^l"^ ^ ifF[X £ C'j] ^ 0, centroids associated with X and C. 
to m the other case, 

N 

The centroidal projection associated C and X is the application Proj^;^ : a; — )■ ^ Gilci{x). 

i=l 

Lemma 1.3 (Huyghens, variance decomposition). Let X be a E-valued L"^ random variable, TV G N* 

N 

and C = {Ci)i<i<N a Borel partition of E. Consider Proj^^j- = J2 ^i^Ci the associated centroidal 
projection. Then one has, 

Var(X) =e[\X- Pvojc,x{X)\^] +^ [|Projc,x(^) - . 

^ V ' ^ V ' 

:=(!) : = (2) 

The variance of the probability distribution X decomposes itself as the sum of the intraclass inertia 
(1) and the interclass inertia (2). 



Proof: 



Var(X) = E 

= E 



\X - Projc,x(^) + Projc.x(^) - ] 

\X - Projc,^(X)|'] +E [|Pro,if,,v(X) - E^Y]|'] 



= (1) =(2) 

+ 2E [{X - Pioic,xiX),PToic,x{X) -E[X])] . 
" V ' 

: = (3) 

Now (3) = since Projc,x(^) = ^ [X\P^o}c,x{^)] ■ ^ 

2 Backgrounds on theory of polytopes 

Let E he a d dimensional vector space and E* its dual. 

Definition 5 (fc-flat). A k-flat is a k-dimensional affine subspace E. 

Definition 6 (convex polyhedron and convex polytope). A convex polyhedron is the intersection of a 
finite subset of closed halfspaces. If it is bounded, it is a convex polytope. 

Definition 7 (cell). A cell is the intersection of a finite set of flats and open halfspaces. And thus, 
equivalently, it is the relative interior of a convex polyhedron. If R C E, we denote cell(i?) the relative 

interior of the convex hull of R. 

Definition 8 (simplex). A simplex is cell(i?) where R is a set of affinely independent points. 

• A 2-dimensional simplex is the interior of a triangle. 

• A 3-dimensional simplex is the interior of a tetrahedron. 

Definition 9 (circumsphere). A circumsphere of a set R C E is a sphere S of E such that R C S. 

Definition 10 (supporting halfspacc). Let C be a convex subset of E. A hyperplane H supports C if 

iJ n C ^ and C is contained into one of the closed halfspaces defined by H . 

Lemma 2.1. Let C <^ E be a convex subset of E. If H is a supporting hyperplane of C, then every 
point of H (iC is a frontier point of C. 

Proof: Let H he a supporting hyperplane of C of equation (j){x) = a. Consider v € E such that 

\/x € E (j){x) = {x\v). 

Consider a € H C\ C . We may assume that Vx e C 4'{x) = {x\v) > a. If a does not belong to the 
boundary of C, 3e > 0, B{a, e) C C so for any A > small enough, a — \v & C and 

a < 4>{a — \v) = {a\v) — A||w|p < {a\v) = a 

which yields a contradiction. Consequently a G dC. □ 
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Corollary 2.2. Every point of the boundary of a convex subset of E belongs to one of its supporting 
hyperplanes. 

Proof: The proof is straightforward using the same approach as for the previous lemma. □ 

Lemma 2.3. If C is a non empty closed convex subset of E , distinct of E , then every point of the 
boundary dC belongs to a supporting hyperplane of C . 

Proof: a G dC Vfc £ N*, 3xfc e i? (a, ^) ,0;^ ^ C. We denote y^ = Pc{xk) the projection of Xk on C, 
= nf^^—ji • Owing to the characterization of the projection on a closed convex subset, we have 

<o 

, ^ ^ 

Vz G C, {xk - pc{xk),Xk - z) = \xk - pc{xk)\ - {xk - pcixk),z - pc{xk)) 

> \xk ~ Pci^k}]'^ > because Xk ^ C. 

Every vector Zk lying on the unit sphere of E (which is compact), one can extract a subsequence of ^^(j;) 
that converges to a vector v, with \v\ = 1. As {xk)i<k converges to a, by continuity of pc and of the 
scalar product, we have 

Wz eC,{v,a- z) ^ lim (zm) , xm) - z)>0. 
k-^+00 

In other words C is contained in the halfspace {z G E,{v,a — z) > 0}. Moreover, as a is in the 
corresponding hyperplane H , H is 'A supporting halfspace of C . □ 

Definition 11 (face). A face of a convex polyhedron P is the relative interior of the intersection of a 
hyperplane supporting P with the closure of P. 

Proposition 2.4. Let P be a convex polyhedron, a face of P is a cell, and a face of a face of P is a face 
of P. 

Definition 12 (fc-face). A k-face is a a face whose affine closure has dimension k. 

Definition 13 (cell complex). A cell complex is a finite collection of pairwise disjoint cells so that the 
face of every cell is in the collection. 

Definition 14 (opposite fc-faces). Two distinct k-cells of a cell complex are opposite if they have a 
common [k — I) -face. 

Definition 15 (triangulation). Let S be a finite point set of E. A triangulation T of S is a cell complex 
whose union is the convex hull of S and whose set of Q- cells is S. 

Definition 1151 is a non standard definition because cells are not required to be simplices. This formalism 
is due to Steven Fortune [4J. 

Definition 16 (proper triangulation). A proper triangulation is a triangulation whose all cells are 
simplices. 

Any triangulation can be completed to a proper triangulation by subdividing non simplicial cells. 

2.1 Voronoi diagrams and Delaunay triangulations 
Voronoi diagram 

Let E he & d-dimensional Euclidean space, and S a finite subset of E. In the following, elements of S 
will be called sites. 

Definition 17 (Voronoi cell). For a nonempty subset of S, R C S, the Voronoi cell of R, denoted V{R) 
is the set of all points in E that are equidistant from all sites in R, and closer to every site of R than to 
any site not in R. 

Proposition 2.5. • Clearly, is r E S , V{{r}) is the set of all points strictly closer to r than to 
any other site. In particular, it is the interior of the Voronoi slab associated to r in S . (See the 
definition of a Voronoi slab in Section]^) 
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• V{R) may be empty. 

• Any point of E lies in V{R) for some R G S . 

Definition 18 (Voronoi diagram). The Voronoi diagram V is the collection of all nonempty Voronoi 
cells V{R) for RdS. 

Delaunay triangulation 

Definition 19 (Delaunay cell). If R <Z S, and V{R) is a non empty Voronoi cell, then the Delaunay 
cell D{R) is ce\\{R). 

Definition 20 (Delaunay triangulation). The Delaunay triangulation D of S is the collection of Delau- 
nay cells D{R), where R varies over subsets of S with V{R) non empty. 

Proposition 2.6 (Empty circumsphere property). For R C S , cell(i?) is a Delaunay cell if and only if 
there is is a circumsphere of R that contains no site of S\R in its interior. 

Proof: Such a circumsphere can be obtained with center an point in the Voronoi cell V{R). □ 
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Figure 1: Voronoi diagram and Delaunay triangulation of a data set S of size 10. We have C G 
^^({■si, S2}). So C is the center of an empty circumsphere of {si,S2}. The point C123 is the center of 
the circumsphere of the Delaunay triangle {si, S2, S3}. 

Theorem 2.7. Let S be a set of n points in E with Voronoi diagram V and Delaunay triangulation D. 
Then 

1. V is a cell complex that partitions E. 

2. D is a triangulation of S . 

3. V and D are linked with the following duality relation: 

For R, R' C S, V{R) is a face of V{R') if and only if D{R') is a face of D{R). 

4. V{R) is unbounded if and only if every site of R is on the boundary of the convex hull of S . 
We refer to [4J for a detailed proof. 
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Locality 

Definition 21 (locally Delauiiay). We consider two opposite d-cells cell(i?) and cell(-R') in a triangula- 
tion T with circumspheres C and C . cell(i?) and cell(i?') are locally Delaunay if R'\R is outside ofC. 
This is equivalent to R\R' outside of C . 

A triangulation is locally Delaunay if every pair of opposite d-cells is locally Delaunay. 

Lemma 2.8 (Delaunay and locally Delaunay). A triangulation is Delaunay if and only if it is locally 
Delaunay. 

We refer to [1] for a detailed proof. 

Definition 22 (General position). Let S he a nonempty finite set of sites in E. S is in general position 
if no d + 1 points of S are affinely dependent and if no d -\- 2 points of S lie on a common sphere. 

Definition 23 (Incircle list). In the following, if S is a finite nonempty set of sites, D is a Delaunay 
triangulation of S and x d E is a settle point, we call incircle list and denote ICLd{x) the set of d-cells 
of D whose circumsphere contains x. 

If S is in general position, no Delaunay cell of S is degenerate. Every cell of the triangulation is a simplex 
and for any R C S, V{R) has dimension d + 1 — \R\. 

Computing the Delaunay triangulation and the Voronoi diagram 

Whereas the Voronoi diagram was defined before the Delaunay triangulation, it has been recognized 
that it is easier to devise algorithms in terms of Delaunay triangulation, especially because of the locality 
property 12.81 

A common data structure for Delaunay triangulations is a graph structure where each simplex is a 
"node". The node contains the indices of the d + 1 sites of the simplex and the pointers to the adjacent 
simplices. Null pointers are used when the simplices lie on the boundary of the triangulation. Cells of 
lower dimension are not directly represented in the graph structure. Another convenient convention is 
that the fcth pointer stored in the node corresponds to the facet obtained by deleting the fcth site in the 
node. Moreover the order is chosen so that the orientation of every simplex in the triangulation remains 
always positive. 

Here, we present the principles of incremental algorithms for Delaunay triangulations. In this kind of 
algorithms, sites are added one by one, and the Delaunay triangulation is modified to include each new 
site. Many other algorithms have been designed for computing the Delaunay triangulation, especially 
in dimension 2. Moreover, computing the Delaunay triangulation of the Voronoi diagram in the one- 
dimensional case simply amounts to sorting the data set. An advantage of incremental algorithms is 
that they are valid in any dimension. Moreover, for another purpose in the following, we will need a new 
algorithm (the friend node algorithm presented in Section [S]) that requires a stage which is very similar 
to the insertion of a new point in the Delaunay triangulation. Hence we will focus here on incremental 
algorithms. 

Let iS* = (si, • • • , spf) be a nonempty finite set of sites of E of cardinal N. We define the sets Sk ■= 
(si, • • • , Sfc) for fc e {1, • • • , N}. Now, for a settled i < N, let us consider Di the Delaunay triangulation 
of Si. We inspect the situation of s^+i with respect to the Delaunay triangulation Di. From this analysis, 
the Delaunay triangulation will be modified locally to build a new Delaunay triangulation D^+i of Si+i. 
When all the sites of S will be processed, we will have the complete Delaunay triangulation D of S. 

Three situations can occur, if S is in general position: 

1. Si+i lies in the interior convex hull of Si. 

2. s,;+i does not lie in any circumsphere of any simplex of Di. 

3. s,;+i lies outside of the convex hull of Si but belongs to a circumsphere of a simplex of Di. 

(1) In the first situation, let denote S := /Ciu. (s^+i) and Fi, - ■ ■ ,Fp the external faces of S of any 
dimension k < d. We can show that the cell complex defined by 

A+i {Di\S) U { cell(^; , s,;+i)„ 1 < j < p} U { {s,+i} } 
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is the Delaunay triangulation associated to Si^i. In a more general setting, we have the fohowing 
property: 

Proposition 2.9 (star-shaped incircle hst). Let S be a nonempty finite set of sites of E and x £ E that 
lies on the convex hull of S . Consider C the union of the d-cells of ICLd{x) and of all its faces. Then 
C is star-shaped from x, that is for any point p £ C, [x,p] C C . 

(2) The second situation is the simplest. If i^i, • • • ,Fp are the external faces of the triangulation Di (of 
any dimension k < d) that are visible from We can show that the cell complex defined by 



is the Delaunay triangulation associated to S'i+i. 

(3) In the third situation, if we denote S = ICLoi{si+i) the set of elements of Di whose circumsphere 
contains s^+i and -Fi,--- ,fp are the external faces) of this set which are not visible from Si+i and 
Fp+i, • • • , -Fp+g are the external faces of Di that are not faces of elements of S and that are visible from 
Xi+i. We can show that the cell complex defined by 



is the Delaunay triangulation associated to S'i+i. 

The first triangulation Dd+i is made of a simple simplex defined by the d + 1 first inserted points. 

One important modification of the incremental algorithm consists in inserting sites in a random order. 
Its expected running time is better than the worst case running time for the incremental algorithm. 
The worst case complexity of computing the Delaunay triangulation of n points in a d dimensional 



On the practical implementation 

The first step is the Localization. It consists in finding whether the new site x is in the convex hull of S 
or not, and if it is the case, in what Delaunay cell of the triangulation Tg x lies. A survey on localization 
methods is available in [2]. When x is inside of the convex hull of S', the localization procedure return 
the index of the the Delaunay cell where it lies. This corresponds to the situation (1). When x is outside 
of this convex hull, the localization returns a Null pointer. This corresponds to the situations (2) or (3). 

The second step consists in finding the list of the Delaunay cells whose circumsphere contains x (the 
incircle list). In the situation (1), this list contains at least the Delaunay cell where x is located. Owing 
to the Proposition 12.91 we know that the union of these Delaunay cells is star-shaped so that it can be 
determined locally by testing connected cells in the graph structure presented above. 

The last step consists in deleting the Delaunay cells of the incircle list and connecting the new site to the 
external faces of the incircle list or the visible faces of the convex hull of S depending on the situation 



(1), (2) or (3). 

3 Classical examples of fast nearest neighbor search algorithms 
in low dimensions 



Given a set of n points, {a;i,--- ,a;„} C E, the nearest neighbor problem is to find the point that is 
closest to a query point q € E. Many algorithms have been proposed to avoid the large computational 
cost of the obvious brute force algorithm. When one has to perform a big amount of nearest neighbor 
searches, a preprocessing of the data set will be profitable if it reduces the average query time. 

The problem is optimally solved in the case of dimension 1, where the best algorithm is, as a prepro- 
cessing to sort the data set by the unique coordinate of its points. (Approximative cost of 0{n\n{n))). 
The search algorithm consists of a simple binary search whose cost is + 0{l). 

In the case of low dimensions, most fast search algorithms still have an approximative preprocessing 
cost of 0{n\og{n)) and an average search cost in 0(log(n)) in low dimension. The criterion of choice 
among them relies on 



A+i A U { ce\\{Fj,s,+i)j,l < j < p} U { {s,+i} } 



A+i := (A\5) U { ce\\{F,,s,+i)j, 1 < j < p} U { {s,+i} } 
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• their effective speed on real data sets, 

• the required memory, 

• the sensitivity of the speed to the dimensionality. 

A first obvious optimization called partial distance search (P.D.S.) consists of a simple modification 
of the brute force search: during the calculation of the distance, if the partial sum of square differences 
exceeds the distance to the nearest neighbor found so far, the calculation is aborted. This almost always 
speeds up the nearest neighbor search procedure. 

3.1 The Kd-tree algorithm 

The Kd-tree algorithm is the archetype of the branch-and-bound nearest neighbor search tree. It is very 
popular because of its simplicity. 

Building the tree: 

• Every point of the data set is associated to the root node. 

• The data set is being sorted by its first coordinate. Then it is divided in two subsets of cardinal 



• Each subset is associated to a child node of the root node. 

• The process is repeated on each child node recursively using the coordinate axis in a cyclic order, 
until there are less than two points in each node. 

Searching in the tree: Let q be the query point. 

• The search procedure begins by searching in what child node q is (depending of its first coordinate) . 

• This child node is then searched, and the process is repeated recursively until a terminal node is 



• A trivial nearest neighbor search is performed in the terminal node. (Partial Distance Search 
optimization can be used.) 

• The procedure moves up to the parent of the terminal node. 

• If the distance c?2 between q and the hyperplane that splits the data set is smaller than the distance 
dmin to the nearest neighbor found so far, the other child node is searched. 

• The procedure continues its way back to the root node. 

Complexity: Except in one dimension where the search complexity is logarithmic (it amounts to a 
binary search), the worst case of the Kd-tree corresponds to the case where every node of the tree is 
explored. Then the worst case complexity is time exponential. The distances to every point is computed. 
The complexity of the preprocessing is 0{d x nlog(n)). 

3.2 The principal axis tree algorithm 

The Principal Axis Tree (PAT) is a generalization of the Kd-tree proposed by McNames in [TU]. Instead 
of using a coordinate axis to sort the data set, its principal axis is used at each step. Moreover, the 
number of child node in the tree can be greater than 2 at each generation. 

Building the tree: 

• Every point of the data set is associated to the root node. 

• The data set is being sorted by its projection on its principal axis. Then it is partitioned in ric 



LtJ + 1 or 




reached. 



subsets whose cardinality is — 




-I- 1 or 




9 



Figure 2: K-d tree elimination condition: if the distance d2 between the query point q and the brother 
node is smaller than the distance dmin to the nearest neighbor found so far, say p, the brother node has 
to be explored. 



• Each subset is associated to a child node of the root node. 

• The process is repeated on each child node recursively until there are less than ric points in each 
node. 

• At each step, the principal axis, and maximal and minimal values of subset's projection on the 
principal axis are kept in memory. 

Optimizing the elimination condition: 




Figure 3: Elimination condition of the principal axis tree. 



We refer here to Figure [3] We can improve the lower bound to the points that belong to child nodes of 
brother nodes. For any point q in region 1 and x in region 2, we have cP{q,x) > d^2 + ^2£c- This result 
is then used again to get a lower bound to points in region 3, and 4 and so on. 

dl^ > (^23 '^^ S Region 3, 

x) > + ^23 + ^x G Region 4. 

Searching in the tree: Let q be the query point. 

• The search process begins by searching in which child node q is (by computing its projection on 
principal axis). 

• This child node is then searched, and the process is repeated recursively until a terminal node is 
reached. 
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• A partial distance search is then performed in the terminal node. 

• The procedure moves up to the parent of the terminal node. 

• The elimination condition is checked to decide if brother nodes have to be searched or not. 

• The procedure continues its way back to the root node. 

Choice of parameter Uc'. For normal or uniform random data sets (and distribution of query points), 
best overall performances are obtained with ric = 7 (independently from dimensionality for d < 10). 
(The same optimal value is obtained by McNames in [T^.) In the case where the data set is an optimal 
quantizer of those distributions, best performance is obtained with Uc — 13. 

Complexity: Space storage is 0{n). Except in the one-dimensional setting where the search complexity 
is logarithmic (it comes to a binary search) , the worst case of the Kd-tree corresponds to the case where 
every node of the tree is explored. Then the worst case complexity is time exponential (2" comparisons 
of coordinates), n distances are computed. The complexity of the preprocessing is 0{d x nlog(n)). 

Algorithm performance: On a 5000 points Gaussian data set in M^, the depth of the tree is 4. 

• 27 (partial) distances, 

• 15 scalar products, 

• 9 binary searches 
are performed in average. 

Why using this space partitioning ? The idea is that good empirical performance of PAT are due 

to the fact that it takes advantage of the shape of the data set. Yet obviously when both query point 
distribution and data sets lie on a smaller dimension (fc < d) subspace of E, one retrieves the same 
complexity as when using the same algorithm on a k dimensional space. This intrinsic dimension is often 
less than the spatial dimension of the space. In a more general setting, PAT takes advantage of high 
correlations in the data set coordinates. 

However if one uses the same number of child nodes Uc in Kd-tree and PAT tree, we see that 

• Preprocessing time is longer for PAT than for Kd-tree. 

• The first traversal of the tree to a terminal node is more costly (projections have to be computed). 

But PAT is still faster because its geometrical partition of the space fits the data set in a more relevant 
way. To be precise, it happens less often than one has to search a brother node with PAT than with 
Kd-tree. 

In IS) , the same space decomposition was proposed for the nearest neighbor search problem (but using 
the only 2 child node at each generation). They justify the use of this decomposition using a heuristic 
criterion, according to which the best possible decomposition of the data-set into two subsets for branch 
and bound nearest neighbor search is to split the data set with respect to its projection on the principal 
axis. 

4 A new quantization based tree algorithm 

As we have seen in previous sections, a good space decomposition that fits to the data distribution may 
lead to a faster branch and bound nearest neighbor search algorithm, if less brother nodes have to be 
explored. The traversal of the tree can be a little more expensive if it is compensated by the gain due to 
the fact that less nodes are explored. 

Principal component analysis and optimal quantization are two types of projection of a probability 
distribution. Similar inertia decompositions hold in the quadratic case (Huyghens lemma). 

PAT is based on a recursive space decomposition based on the principal component analysis of the 
underlying data set. The initial idea here is to design a branch and bound algorithm based on a recursive 
quantization of the empirical distribution of the underlying data set. 
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4.1 The crude quantization tree algorithm 
Building the tree: 

• Every point of the data set is associated to the root node. 

• The data set is being partitioned into ric subsets corresponding to the Voronoi cells of an optimized 
quantizer of the empirical distribution of the data set. 

• Each subset is associated to a child node of the root node. 

• The process is repeated on each child node recursively until there are less than a certain number 
of points in each node. 

Some other computations are done during the preprocessing that will be detailed further on. 
Remark. One notices that the resulting search tree is not balanced and may have some longer branches. 
Searching in the tree: Let q be the query point. 

• By performing trivial nearest neighbor researches in the node's quantizer the search algorithm 
traverses the tree to a terminal node where a trivial partial distance search is performed. 

• The procedure moves up to the parent of the terminal node. 

• The elimination condition, (developed further on) is checked to decide whether brother nodes have 
to be searched or not. 

• The procedure continues its way back to the root node. 

Consistency of the space decomposition: 

Implementing only the way down to the terminal node (with rtc = 7 in both principal axis tree and 
quantization tree), we naturally do not obtain always the index of the nearest neighbor. But we have 
noticed that the result is more often the right one with the quantization tree than with the principal 
axis tree. 

For instance, in dimension 2, on a 5000 points Gaussian data set, on a million Gaussian query points, 
we notices: 

• 56 percent of false results with PAT. 

• 16 percent of false results with the quantization tree. 

Similar results are obtained with other values of the parameters and other data set distributions. This 
empirical test makes us reasonably optimistic about the performance of a branch and bound tree based 
on this decomposition. 

Still, the cost of the way through the search tree is more expansive with the quantization tree (as 
described above). 

• For the "quantization tree", we have to perform trivial nearest neighbor search to find the right 
child node. 

• For "principal axis tree", we only compute a projection and perform a binary search. 

Moreover, it was proved in [3] that in the case of Gaussian distributions, the afHne subspace spanned 
by stationary quantizers correspond to the first principal components of the considered Gaussian dis- 
tribution. (This result, extended to the infinite dimensional case in [S] allows to efficiently compute 
optimal quadratic quantizers of bi-measurable Gaussian processes.) Hence, in this case, this shows that 
the quantization tree with two branches at each generation is related to the principal axis tree. 

First elimination condition If the center of the Voronoi cell corresponding to the current node is A, the 
first rough method to decide whether a brother node with center B has to be explored or not is compute 
the distance d2 of the query point Q to the Leibniz halfspace H{B,A). Then the node corresponding 
to point B is explored if c?2 is smaller than the distance to the nearest neighbor found so far, c?i. We 
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QA^ + AB"^ - 2AQABcoi,a so that ^ cos a 



have d2 — — AQcosa and QB — k^ji i jij^ - ^^n^^u^ ^.^olx uhcuu .^woij: — 2Aqab 

This yields ^2 = '^^2AB^ ■ Hence, the computation of the distance to the Leibniz halfspace requires 
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one subtractions QA'^ — QB^, {QA^ and QB^ can be computed during the search in the quantizer in the 
parent node), and one multiphcation by jXb" (zJb computed during the preprocessing.) 

Then, it is clear that the nearest brother node correspond to the second nearest neighbor in the quantizer, 
and the second nearest to the third nearest neighbor, and so on. Hence, brother nodes have to be explored 
in the order defined by the distances of its centers the query point. 

We can also use the same optimization of the lower bound proposed by McNames in jlOj and presented 
in Section 13.21 Referring to Figure SI the lower bounds di are recursively incremented when exploring 
brother nodes. 




Figure 4: Optimization of the elimination condition for the quantization tree > d^ 



d\ 



dl 



Performance of this first quantization tree algorithm. This first algorithm has been implemented 
and its empirical performances has been compared to the two previously exposed PAT and Kd-tree in 
terms of empirical performances. 

Intermediate performances between our implementations of Kd-tree and PAT were obtained in small 
dimensions. Although, as we will see further in empirical tests, it seems to take better the increase of 
dimensionality. The preprocessing time, that requires small quantizer computations is also more costly 
than both PAT and Kd-tree. 

4.2 Optimizations for the quantization tree 

To reduce the average query time, we are now proposing a new optimization procedure which reduces 
the number of brother nodes to be checked. 



2 
1 
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Figure 5: CeU B is "hidden" from cell A. 
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Let us consider the Voronoi diagram plotted in Figure \5\ In this figure, we obviously know that when 
the query point is in a cell A, its nearest neighbor cannot be in cell B, because cell B is "hidden" by 
closer cells. One has to give a precise mathematical sense to "hidden" in this sentence. However, in the 
quantization tree as it has been described, the distance of query point to H{a, b) has to be computed. 

A first idea is to compute for each 1 < i < ric a. list of "friends" among brother nodes in which the 
nearest neighbor can be when q is in cell i. 

This list has to be large enough to ensure that it contains the nearest neighbor but as small as possible 
in order to reduce the computations of elimination conditions. 

As concerns the choice of the parameter Uc, we have to take in consideration that increasing makes 
the depth of the tree smaller but also makes the nearest neighbor search slower for each generation of 
the search tree. 

How can we obtain a friend Voronoi slabs list? The first observation about obtaining such a 
friend list is that it is not a simple problem. Indeed, this list is a priori not reduced to slabs whose 
corresponding Voronoi cells are adjacent in the Voronoi diagram. Moreover, in some cases, the minimal 
friend list can be quiet large. So is the case for unbounded Voronoi slabs for example. 




Figure 6: In these cases, the nearest neighbor of the query point q may be p although p is not in an 
adjacent Voronoi cell. 



5 Some optimizations for the quantization tree algorithm 

In Section [2. 1[ basic definitions about Voronoi diagrams and Delaunay triangulations that are prerequi- 
sites to this section ahev been recalled. 

Remark (Voronoi slabs and Voronoi cells). From their respective definitions, one can easily deduce the 
following properties: 

• Let S <Z E be a finite set of sites, let C be an associated Voronoi partition and consider s £ S . 

o 

Then it is clear that V{{s}) = slabc(s). 

• The points of the Voronoi cells V{R) with R C S and cardi? > 1 belong to the boundaries of 
Voronoi slabs. 

• As a consequence, for s G S , as the boundary V{{s}) is constituted with its faces of lower dimen- 
sions, previous remark yields V{{s}) — slab(s) and Sslahsis) — dVs{{s}). 

Notations: In the following of this section, if C i? is a finite set of sites in E, one will denote T5 
the Delaunay triangulation of S, DGs the Delaunay graph of S, Vs its Voronoi diagram. For B C S, 
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Vs{R) will represent the Voronoi cell of R in S. If Cs is a Voronoi partition associated to S, and s G S, 
slabs (s) will denote the Voronoi slab associated to S is the Voronoi partition C. 

Definition 24 (Leibniz halfspace). For (a, &) e i?-^ let us denote H{a,b) :— |x e R'^Ha; — a| < |x — 6|| 
the Leibniz halfspace associated to {a,b). 

Proposition 5.1. An obvious property is if S is a finite set of sites of E, and p S , 

vs{{p})^ n ^(P'*)- 

Proposition 5.2. If S is a finite set of sites of E, and p € S, Vs{{p}) = f~] H{p,s). 

{s,p}£DGs 

Lemma 5.3. Let S d E be a, nonempty finite set of sites in E and x G E\S. Consider s G 5, the 
following assertions are equivalent: 

1. {x,s} G DGsyj{x}- 

2. Vs{{s})r^Vs^{.}{{x})^^. 

3. Vs{{s})r\H{x,s)^%. 
Proof: 

• (1. 2.) Assume that {x, s} G DGs\j{x} then by definition, it is equivalent to Vs\j{x}{{x, s}) ^ 0. 

Vs\j{x}{{xt s}) is (d— l)-faceof V5u{3;i,(x). Moreover, by definition of Voronoi cells, Vs\j{x}{{xt s}) C 
Vs({s}), which is open. As a consequence, Vy G Vs\j{x}{{xt s}), Ve > 0,B{y,e) fl Vsu{x}{x) 7^ 0- 
And for small enough e, B{y,e) C Vs{{s}). We can conclude that Vs{{s}) fl Vsu{x}{{x}) ^ 0- 

• (2. => 3.) is obvious owing to Proposition l5.ll 

• (3. 1.) If y G Vs{{s})nH{x,s), let us show that Vsu{x}{{x, s}) ^ 0. 

Consider the segment [s, y]. By convexity, [s, y] C V^ds}). Thus every point of [s, y] is closer to s 
than to any other point of S. On the other hand, it can either be closer to s than to x, or closer 
to X than to s or at the same distance. 

We now define the applications / : [0, 1] — > [s, y] C E hy /(A) = As + (1 - A)y and A : i? -!> M by 
A{p) = d{p,x) - d{p,s). 

A o / is a continuous function with A o /(O) > 0, A o /(I) < 0. The intermediate value theorem 
shows that there exists A* such that A o /(A*) = and thus /(A*) G Vsu{x}{{x, s}). □ 

The first modification made in the quantization tree algorithm is to assume that the points of the 
quantizer at each generation are points of the underlying codebook F. (In order to fulfill this requirement, 
we project an optimal quantizer onto the codebook.) 

Corollary 5.4. Let T — {Fi,-- - ,F„} be a codebook of E. S = {si,--- ,Sp} C. T be subset of T . 
Let Projp be a nearest neighbor projection on F. F is being partitioned into p subsets F^, • ■ • , F^ with 
Fi = F n slabs (si), i'V their nearest neighbor projection on S . 
Consider q € E. If q & slabs(s) and t = Projp(s) then {t, s} G DGsij{t}- 

Proof: This is a straightforward consequence of the previous lemma. □ 

Notation: Let 5* be a set of sites in E. For a point t in E, we denote PIs{t) = |s G 5, {s, t} G DG svj{t}^ ■ 
The notation PI stands for "Pseudo-Insertion". 

From an algorithmic viewpoint, the Delaunay graph of S being computed, PIs{t) stands for the sets 
of points in S', that are connected to t when updating the Delaunay graph to take account of this new 
point. 

Implementing a procedure that computes Pis {t) is very similar to the insertion procedure of point t 
in Ts- 

First friend node algorithm: This leads to a first method to compute a friend list: 
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Figure 7: If the query point q lies on the dark grey region H{x, s) Ci Vs{{s}) its nearest neighbor may be 

X. 



For every point p of the underlying codebook, 

• Compute s — PTO}g(j)) and PIs{p). 

• Then for every point s' G PIs{p), insert s in the set of friends of node s' . 



This method gives a first algorithm to compute friend list. Still, when the data set is large, it is very 
expensive because one has to deal with all the points of the data set. 

In fact it is possible to compute an acceptable friend list thanks to the same result [5751 without using 
the points of the underlying data set. 

Fast friend node algorithm: In this section, another method to compute friend node lists is devised 
which does not need to deal with the complete underlying data set but only the underlying codebook. 

When keeping the same notations, the principle of the method is to compute for every slabs (s), s G S* 
,of the Voronoi partition Cs, the set UPIs{s) := IJ PIs{p). It is the union of all the pseudo-insertions 

pGslabs(s) 

of points of slabs (s). If one is able to compute this set, the resulting friend nodes algorithm simply 
writes: 

For every point s E S, 

• Compute UPIs{s). 

• Then for every point s' G UPIs{s), insert s in the set of friends of node s' . 



The question is: how can we compute UPIs{s)7 

Lemma 5.5. With the same notations, one has UPIs{s) = [J PIs{p)- In other words, we have to 

p^5 slab^ (s) 

checkpoints of the boundary 6 slabs (s) o/slabs(s). 

Remark. Let us recall that, thanks to Provosition \2. 51 ((5slabs(s) = dVs{{s}). 

Proof: Consider x G slabs(s) such as s' G PIs{x). Let us define x* , such that {x*} — [x, s'] n dVs{s). 

• One has H{x*,s') D H{x,s'). So Vs{{s'}) H H{x* , s') Z) Vs{{s'}) n H{x, s'), hence Vsi{s'}) H 
H{x,s') 7^ ^ Vs{{s']) n H{x*,s') ^ that is equivalent to s' G PI{x*) thanks to the Lemma 

• Finally, Va; G slabs(s),Vs' G PIs{x),3x* G (5slabs(s) such that s' G PIs{x*). □ 
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Remark. As there are not a finite number of sites on the boundaries, this does not give an effective 
method for computing UPIs{s) yet. 

As seen in Section 12.11 computing the set Pis (x) corresponds almost to the same algorithm as the 
insertion procedure in an incremental triangulation algorithm, that is: 

• Localization of x in the triangulation, 

• Computation of the set ICL{x), 

• UIs{x) is the set of points that belong to a cell of ICL{x) plus, if x is outside the convex hull of 
S, the points of the external faces of Tg that are visible from x. 

Lemma 5.6. Let S be a non empty finite set of sites in E. We consider the circumsphere C of Delaunay 
d-cell of the Delaunay triangulation Ts- We denote c its center and r its radius. Let s be a site of S. 
IfVsiis}) C C 7^ then c + ^(s - c) e Vsi{s}). 

The proof is straightforward. This leads to an algorithm to compute sets {UPIs{s))ses- 

• For every Delaunay d-cell D of Ts 

— Compute the center c and radius r of its circumsphere. 

— For every site s G S" that is not in D, compute p :— c + ^ G ({•*}); and check if the 
site s is the nearest neighbor of p in S. If so is the case, then the points of the Delaunay 
d-ceUs D belong to UPIsis). 

• Then deal with unbounded Voronoi cells: 

— For every external face F of the Delaunay triangulation. compute a normal vector up directed 
toward the exterior of the convex hull of S. 

— For two distinct external faces Fi and F2 of the Delaunay triangulation, if {up-^,UF2) > 
then for every (si,S2) e Fi x F2, si e UPIs{s2) and S2 G UPIs{si). 



In Figure [5J we present some friend Voronoi lists in the 2-dimensional case. 




Figure 8: Examples of friend Voronoi cells in a two-dimensional Voronoi diagram in the case of a bounded 
Voronoi cell (left) and in the unbounded case (right). In both case, the dark gray region is the considered 
Voronoi cell and the light gray regions are the friend Voronoi cells. 
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6 Test with real data sets 



To perform the following tests, the quantization tree algorithm and the friend-node optimization have 
been implemented in CH — h. Because of the additional feature related to computational geometry that 
we needed, as the pseudo-insertion procedure, we had to implement a Delaunay triangulation. All the 
figures presented in this article were generated with this implementation of the Voronoi diagram with 
which we performed the following tests. 

6.1 Tests on Gaussian and uniform data sets 

In Tables [TU] and we report the execution time for 10 millions nearest neighbor queries on data- 
sets of size 5000 generated with independent Gaussian pseudo random variables and with a uniform 
distribution on the hypercube. The best overall performances were obtained with Uc = 35 children by 
node for the quantization tree. The tests were performed with an Intel Pentium Dual CPU at 2GIIz. We 
noticed that in dimension d — 2 and d = 3, we had intermediate performances between the "principal axis 
tree" and the Kd-tree algorithms. In dimension 4, the performance of the "principal axis tree" and the 
"quantization tree" are close one to each other. Finally, it seems that the quantization tree has a better 
behaviour in dimensions greater than 5 where it significantly outperforms the two other implemented 
methods. 
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Figure 9: Execution time of 10 millions random queries on a data set of 5000 points, generated with a 
Gaussian pseudo random generator. 
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Figure 10: Execution time of 10 millions random queries on a data set of 10000 points, generated with 
a Gaussian pseudo random generator. 
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Figure 11: Execution time of 10 millions random queries on a data set of 5000 points, generated with a 
uniform pseudo random generator. 



Remark (Computational cost or the preprocessing for the friend cell algorithm). An important fact that 
we have experienced is that, in higher dimensions, the friend cells list becomes bigger and there is no 
more competitive advantage in using it in dimension higher than 7 (when having less than 30 branches 
per generation in the quantization tree). Moreover, as it requires to compute Delaunay triangulations 
during the preprocessing, whose complexity exponentially increases with the dimension, the computational 
cost of the friend cell preprocessing makes it useless in higher dimensions. 
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